Item response theory modeling of divergent thinking fluency scores in a Bayesian regression framework

Many opportunities, and a few challenges

Nils Myszkowski

Department of Psychology, Pace University

March 10, 2025

Modeling fluency scores

Fluency scores

In divergent thinking tasks, fluency scores are the count of unique responses provided by a respondent.

They come with unique psychometric challenges…

Non-normal distributions

Non-linear item responses and heteroscedasticity

…and in certain cases

  • Different types of prompts (alternate uses, stories, etc.)
  • Possible violations of measurement invariance (e.g., “uses of a knife” may be an easier prompt for a chef)
  • Possibly different time constraints per task

…item response theory provides this flexibility!

The 2-parameter Poisson counts model (2PPCM)

According to the 2PPCM (Myszkowski & Storme, 2021), the fluency score \(X_{ij}\) of person \(i\) for item \(j\) comes from a Poisson distribution of rate parameter \(\lambda_{ij}\)

\[X_{ij} \sim \text{Poisson}(e^{a_j\theta_i + b_j})\]

  • \(\theta_i\) is the person’s latent fluency
  • \(a_j\) is the slope/discrimination/loading of the item (constant in the Rasch Poisson counts model)
  • \(b_j\) is the easiness of the item
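As a quick illustration of this data-generating process (a Python sketch, not the paper's R/brms code; the item parameter values below are made up):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical item parameters for 3 items (illustrative values only)
a = np.array([0.8, 1.0, 1.2])   # slopes/discriminations a_j
b = np.array([1.5, 1.7, 1.4])   # easinesses b_j (log scale)

n_persons = 202
theta = rng.normal(0.0, 1.0, size=n_persons)  # latent fluency, standardized

# Poisson rate lambda_ij = exp(a_j * theta_i + b_j), then sample counts
lam = np.exp(np.outer(theta, a) + b)          # shape (n_persons, n_items)
scores = rng.poisson(lam)                     # simulated fluency scores
```

Note how the log link keeps every rate positive, and how a larger slope \(a_j\) makes the expected count react more strongly to \(\theta_i\).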

Model

Model structure

[Diagram] Person (random effect) → θ (latent fluency); Item (fixed effects) → easiness b and slope a; θ, b, and a jointly determine the Score (Poisson count).

Estimation

Primarily developed through maximum likelihood approaches

  • In generalized SEM software (e.g., Mplus, Stata)
    • Commercial, closed-source packages
  • In dedicated R packages (e.g., countirt)
    • Risk of “abandonware”
  • In generalized linear mixed-effects model (GLMM) software (e.g., lme4)
    • Does not accommodate variable discrimination parameters; not very flexible

Can a general purpose Bayesian estimation framework do better?

Our questions

  • Is it feasible to estimate log-linear count IRT models like the 2PPCM in a Bayesian framework, using packages not dedicated to count IRT?

  • Can we obtain results similar to maximum likelihood estimates by using weakly/non-informative priors?

  • Are the benefits of count IRT models limited or expanded by a Bayesian framework?

  • Are the benefits of a Bayesian framework (reasonably) attainable? Are they (possibly) useful in the context of divergent thinking tests?

  • Are there outstanding issues? New areas of development?

Markov Chain Monte Carlo estimation (in a nutshell)

Inputs

  • Model
  • Dataset
  • More or less vague plausible probability distributions for all parameters of the model (prior distributions)

Process

  • Sample from the model parameters’ probability distributions
  • Generate many plausible parameter values based on data & priors
  • Favor values that best fit the data

Output

  • Updated probability distributions for all parameters (posterior distribution)

Stan and brms

Stan (Carpenter et al., 2017): A probabilistic programming language suited for Bayesian estimation using Hamiltonian Monte Carlo (HMC).

  • Fast estimation, good convergence, flexibility of prior distributions and models

brms (Bürkner, 2017): An R package to estimate various models in Stan using regression-like syntax (e.g., y ~ x1 + x2)

  • Has been shown to accommodate logistic item response models (Bürkner, 2020); more convenient than raw Stan syntax

How convenient?

It’s not too bad! See our paper (Myszkowski & Storme, 2025)

…but here’s a quick look!

Does it work?

Example dataset

  • Publicly available dataset for special issue (Forthmann et al., 2019)

  • 202 respondents (variable Person)

  • 3 alternate uses tasks (rope, paperclip, garbage bag) (variable Item)

  • Main analysis: 2PPCM with random person effect and fixed item effects (identified with the variance standardization method)

Specifying the model

  • Translating the 2PPCM for brms
formula_2PPCM <- bf(
  Score ~ 0 + slope * theta + easiness, #linear part of the item response model
  theta ~ 0 + (1 | Person),             #Theta is a random effect of the person
  slope ~ 0 + Item,                     #Slope is a fixed effect of the item
  easiness ~ 0 + Item,                  #Easiness is a fixed effect of the item
  nl = TRUE,                            #Tells brms that we are doing weird non-linear things
  family = poisson(link = "log")        #Log-linear Poisson model
)

Estimation

fit_2PPCM <- brm(
  formula = formula_2PPCM,                #Passing the model formula
  data = data_long,                       #Passing the dataset
  prior = prior("constant(1)", 
                class = "sd", 
                group = "Person", 
                nlpar = "theta"),         #Fix SD of theta to 1 to identify the model
  iter = 2000, warmup = 500, chains = 4   #Technical options
  )

Results comparable with maximum likelihood

With non/weakly informative priors


  • Factor scores

  • SE / Posterior uncertainty

We can do all things (count) IRT!

  • Factor scores (point estimate, error, CI)
  • Item parameters (point estimate, error, CI)
  • Item response functions
  • Item and test information functions
  • Sample/person level reliability
  • Handling of missing data
  • Covariate-adjusted frequency plots
  • Dispersion estimates (overall and by item)
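For instance, item and test information have a convenient closed form in this model: a Poisson item's Fisher information about \(\theta\) is \(a_j^2 \lambda_j(\theta) = a_j^2 e^{a_j\theta + b_j}\). A short Python sketch (illustrative parameter values, not estimates from the paper):

```python
import numpy as np

# Illustrative item parameters (made up, not estimates from the paper)
a = np.array([0.8, 1.0, 1.2])   # slopes
b = np.array([1.5, 1.7, 1.4])   # easinesses

def item_information(theta, a, b):
    """Fisher information of a Poisson count item about theta: a^2 * exp(a*theta + b)."""
    return a**2 * np.exp(a * theta + b)

theta_grid = np.linspace(-3, 3, 121)
info = item_information(theta_grid[:, None], a, b)  # (121, 3): per-item information
test_info = info.sum(axis=1)                        # test information function
```

Unlike binary IRT, information here increases monotonically with \(\theta\) (when slopes are positive): higher expected counts carry more information about more fluent respondents.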

What are the advantages?

A more natural handling of uncertainty

Full posterior distributions of item parameters and ability (as opposed to points with standard errors).

Probabilistic conclusions about items

“Is item 1 easier than item 3?”

Probabilistic conclusions about persons

“Is person 1’s fluency more than 1 standard deviation higher than person 2’s fluency?”
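With full posteriors, such questions reduce to counting draws. A Python sketch with simulated draws standing in for brms output (the posterior means and SDs below are made up):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 10000  # number of posterior draws

# Stand-ins for posterior draws of two persons' thetas (made-up summaries)
theta_1 = rng.normal(1.5, 0.30, n)
theta_2 = rng.normal(0.0, 0.30, n)

# "Is person 1's fluency more than 1 SD higher than person 2's?"
# (theta is standardized, so 1 SD = 1 on the latent scale)
p = np.mean(theta_1 - theta_2 > 1.0)  # posterior probability of the statement
```

The item question works the same way: compute the share of draws where \(b_1 > b_3\) across the posterior samples of the easiness parameters.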

Hierarchical structure for item parameters

Treat item characteristics (e.g. slope) as random deviations from a shared distribution.

formula_2PPCM_ri <- bf(
  Score ~ 0 + slope * theta + easiness,
  theta ~ 0 + (1 | Person),
  slope ~ 0 + (1 | Item),
  easiness ~ 0 + (1 | Item),
  nl = TRUE,                         
  family = poisson(link = "log")
)

Extensibility to explanatory models

Item covariates (e.g., object type):

formula_2PPCM_expl <- bf(
  Score ~ 0 + slope * theta + easiness,
  theta ~ 0 + (1 | Person),
  slope ~ 0 + Item,
  easiness ~ 0 + object_type,
  nl = TRUE,                         
  family = poisson(link = "log")
)

Person covariates (i.e., latent regression / latent mean differences)

formula_2PPCM_expl <- bf(
  Score ~ 0 + slope * theta + easiness,
  theta ~ 0 + training + (1 | Person),
  slope ~ 0 + Item,
  easiness ~ 0 + Item,
  nl = TRUE,                         
  family = poisson(link = "log")
)

Regularization

  • In maximum likelihood estimation, items with very high or low discrimination can cause instability in the model (especially in small samples).

  • By using informative priors (i.e. Bayesian regularization), we can avoid unrealistic parameter values and unstable models.

  • Applicable to extensions (e.g., avoiding differential item functioning false positives)
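The shrinkage idea can be shown in one dimension (a toy Python sketch with made-up counts, not the brms workflow): the ML easiness estimate for an extreme item is pulled toward plausible values by an informative prior.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Toy counts for one item; the ML easiness estimate is the log of the sample mean
x = np.array([9, 12, 30])
ml_b = np.log(x.mean())

# MAP under a Normal(0, 1) prior on b: the prior penalty shrinks b toward 0
def neg_log_post(b):
    return -(x.sum() * b - len(x) * np.exp(b) - b**2 / 2)

map_b = minimize_scalar(neg_log_post, bounds=(-5, 5), method="bounded").x
# map_b < ml_b: the prior pulls the estimate toward plausible values
```

In small samples the same mechanism keeps slope estimates away from implausibly extreme values, stabilizing estimation.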

Incorporating model uncertainty

Rather than selecting a single model with a single set of priors, we can combine multiple models and/or multiple sets of priors using Bayesian model averaging.

  • For example, we can obtain \(\theta\) posterior distributions from different models, which we average in proportion to the posterior model probabilities.

  • Avoids reliance on a single model, leading to more robust predictions.
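Mechanically, averaging posteriors in proportion to model probabilities amounts to resampling draws from a mixture. A Python sketch with made-up draws and made-up model weights (in practice both would come from the fitted models, e.g., via bridge sampling):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10000

# Stand-ins for one person's theta draws under two competing models (made up)
theta_m1 = rng.normal(1.0, 0.30, n)   # e.g., draws under the 2PPCM
theta_m2 = rng.normal(0.6, 0.35, n)   # e.g., draws under the RPCM

# Posterior model probabilities (made up for illustration)
w = np.array([0.7, 0.3])

# Model-averaged posterior: for each draw, pick a model with probability w
pick = rng.choice(2, size=n, p=w)
theta_bma = np.where(pick == 0, theta_m1, theta_m2)
# The averaged posterior mean lies between the two models' means, weighted by w
```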

What are the outstanding issues?

Convergence issues

Default priors were sufficient for the RPCM, but not for the 2PPCM.

  • In the paper, we explain how to choose weakly informative priors, which help. It’s also refined in my poster (on OSF repository).

Applicability in various contexts

  • Other divergent thinking datasets
  • Other count responses (e.g., counts of errors, verbal fluency)
  • Item-dependent dispersion parameter (Forthmann et al., 2019)
  • Multidimensional applications
  • Tests with both count responses and non-count responses

Thank you!

Find this presentation at https://osf.io/9f4eu/

References

Bürkner, P.-C. (2017). brms: An r package for Bayesian multilevel models using Stan. Journal of Statistical Software, 80(1), 1–28. https://doi.org/10.18637/jss.v080.i01
Bürkner, P.-C. (2020). Analysing standard progressive matrices (SPM–LS) with Bayesian item response models. Journal of Intelligence, 8(1), 5. https://doi.org/10.3390/jintelligence8010005
Carpenter, B., Gelman, A., Hoffman, M. D., Lee, D., Goodrich, B., Betancourt, M., Brubaker, M., Guo, J., Li, P., & Riddell, A. (2017). Stan: A probabilistic programming language. Journal of Statistical Software, 76, 1–32. https://doi.org/10.18637/jss.v076.i01
Forthmann, B., Gühne, D., & Doebler, P. (2019). Revisiting dispersion in count data item response theory models: The Conway–Maxwell–Poisson counts model. British Journal of Mathematical and Statistical Psychology. https://doi.org/10.1111/bmsp.12184
Myszkowski, N., & Storme, M. (2021). Accounting for variable task discrimination in divergent thinking fluency measurement: An example of the benefits of a 2-parameter Poisson counts model and its bifactor extension over the Rasch Poisson counts model. The Journal of Creative Behavior, 55(3), 800–818. https://doi.org/10.1002/jocb.490
Myszkowski, N., & Storme, M. (2025). Bayesian Estimation of Generalized Log-Linear Poisson Item Response Models for Fluency Scores Using brms and Stan. Journal of Intelligence, 13(3), 26. https://doi.org/10.3390/jintelligence13030026